# Design and Implementation of 8-Bit CMOS Programmable Multi-Bit Funnel Shifter

Xiong Yinglyu<sup>1,a</sup>, Udara Kalingage<sup>2</sup>, HUA Rui<sup>2</sup>, WANG Hanzhi<sup>2</sup>

<sup>1</sup>Electronic and Computer Engineering (ECE) Department, The Hong Kong University of Science and Technology, Hong Kong, China

<sup>2</sup>Department of Electronic and Computer Engineering (ECE), The Hong Kong University of Science and Technology, Hong Kong, China

<sup>a</sup>xiongvingly@outlook.com

**Keywords:** Funnel shifter, rotators, arithmetic, multiplexer, critical path delay.

**Abstract:** This paper presents the design and implementation of an 8-bit CMOS funnel shifter in a 0.25 $\mu$m CMOS process with a 2.5 V power supply. By selecting the appropriate operands and extensions, the shifter performs logical, rotate, and arithmetic shifts. The proposed funnel shifter is built mainly from 2-to-1 multiplexers (MUXes). The critical-path propagation delay of the circuit is analyzed and measured by applying appropriate input vectors, assuming each output drives a 50 fF extrinsic load capacitor. Post-layout simulation shows that the total delay of the programmable funnel shifter can be optimized to 1.65 ns, and the layout area of the proposed architecture is about 7680 $\mu$m$^2$.

## 1. Introduction

Multiplication and division of binary numbers require shift operations. In information security, for example, the AES encryption algorithm [1] is currently the most widely used standard and offers strong security. In its ciphertext generation process, a shift operation is applied in the ShiftRows transformation. The barrel shifter is one of the most commonly used shifters in computer architecture, but it may not be fast enough for a given application [2]. A funnel shifter is therefore a better choice when minimum delay must be achieved.

## 2. Standard Cell Library Development

In this design, one XOR gate and five MUXes with different W/L ratios are developed to lower the propagation delay. Table 1 lists the transistor sizes (width and length) of all the MUXes and the XOR gate used in our 8-bit programmable multi-bit funnel shifter.

Table 1: Sizes of MUXes and XOR Gate.

| Logic Gate | NMOS $W$ ($\mu$m) | NMOS $L$ ($\mu$m) | PMOS $W$ ($\mu$m) | PMOS $L$ ($\mu$m) |
|------------|-------------------|-------------------|-------------------|-------------------|
| MUX 1      | 0.90              | 0.24              | 1.68              | 0.24              |
| MUX 2      | 0.84              | 0.24              | 1.56              | 0.24              |
| MUX 3      | 0.78              | 0.24              | 1.44              | 0.24              |
| MUX 4      | 0.72              | 0.24              | 1.32              | 0.24              |
| MUX 5      | 0.66              | 0.24              | 1.20              | 0.24              |
| XOR gate   | 0.96              | 0.24              | 1.80              | 0.24              |

For all MUXes and the XOR gate, a mirror implementation with pass-transistor logic is used, which outperforms the transmission-gate implementation. The schematics and layouts of the MUX and the XOR gate are shown in Figure 1. Only MUX 1 is shown; the other MUXes are identical except for their transistor widths.

Figure 1: Schematic and layout of MUX and XOR gate. (a) Schematic of MUX. (b) Schematic of XOR gate.

By holding the select signal (s) at a constant 2.5 V, the output (op) follows the input (ip2). The output low-to-high propagation delay ($t_{\rm PLH}$) and high-to-low propagation delay ($t_{\rm PHL}$) of each MUX can therefore be measured from the simulation waveform. Similarly, by applying 0 to input in1 and 1 to input in2 of the XOR gate, $t_{\rm PLH}$ and $t_{\rm PHL}$ of the XOR gate can be measured. Table 2 shows the simulated propagation delay of each MUX and the XOR gate with three extrinsic load capacitors: $C_1 = 15$ fF, $C_2 = 8C_1 = 120$ fF, and $C_3 = 64C_1 = 960$ fF.
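The $t_{\rm PLH}$/$t_{\rm PHL}$ measurement described above can be sketched in Python as a post-processing step on exported waveform samples. This is an illustrative assumption rather than the authors' flow: each delay is taken between the 50% $V_{DD}$ crossings of input and output (a common convention; the paper does not state its thresholds), the waveforms are treated as piecewise-linear samples containing a single input pulse, and the two delays are averaged as in Equation (1). The function names and the sample waveform are hypothetical.

```python
from typing import Sequence

VDD = 2.5  # supply voltage used in the paper (V)


def crossing_time(t: Sequence[float], v: Sequence[float], level: float, rising: bool) -> float:
    """Return the first time `v` crosses `level` in the given direction (linear interpolation)."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(t, vin, vout, vdd=VDD, inverting=False):
    """Return (tPLH, tPHL, tp) measured between 50% crossings (assumes one input pulse)."""
    half = vdd / 2
    # tPLH: the output rises. The triggering input edge rises for a cell assumed to be
    # non-inverting (the default here) and falls for an inverting one.
    t_plh = crossing_time(t, vout, half, True) - crossing_time(t, vin, half, not inverting)
    # tPHL: the output falls; the triggering input edge is the opposite one.
    t_phl = crossing_time(t, vout, half, False) - crossing_time(t, vin, half, inverting)
    return t_plh, t_phl, (t_plh + t_phl) / 2  # Equation (1)


# Hypothetical piecewise-linear waveforms (time in ns, voltage in V).
t = [0.0, 1.0, 1.2, 1.3, 1.5, 3.0, 3.2, 3.4, 3.6, 4.0]
vin = [0.0, 0.0, 2.5, 2.5, 2.5, 2.5, 0.0, 0.0, 0.0, 0.0]
vout = [0.0, 0.0, 0.0, 0.0, 2.5, 2.5, 2.5, 2.5, 0.0, 0.0]
t_plh, t_phl, tp = propagation_delay(t, vin, vout)
print(f"tPLH = {t_plh:.3f} ns, tPHL = {t_phl:.3f} ns, tp = {tp:.3f} ns")
# prints: tPLH = 0.300 ns, tPHL = 0.400 ns, tp = 0.350 ns
```

For a cell whose output is inverted with respect to the toggling input, passing `inverting=True` pairs each output edge with the opposite input edge.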
The propagation delay is given by

$$t_{\rm p} = \frac{t_{\rm PLH} + t_{\rm PHL}}{2} \tag{1}$$

Table 2: Summary of Propagation Delay $t_{\rm p}$ (ns) Driving Different Extrinsic Load Capacitors.

| Load Capacitor   | MUX 1 | MUX 2 | MUX 3 | MUX 4 | MUX 5 | XOR gate |
|------------------|-------|-------|-------|-------|-------|----------|
| $C_1$ = 15 fF    | 0.336 | 0.308 | 0.321 | 0.337 | 0.359 | 0.271    |
| $C_2$ = 120 fF   | 0.720 | 0.636 | 0.675 | 0.722 | 0.773 | 0.741    |
| $C_3$ = 960 fF   | 3.273 | 2.929 | 3.098 | 3.275 | 3.460 | 3.265    |

The results show that the propagation delay increases with the extrinsic load capacitance. Furthermore, it increases with the ratio of extrinsic to intrinsic delay. The logical effort of a CMOS inverter is 1, whereas that of a complex gate such as the MUX used in this design is always greater than 1. The product of fanout and logical effort is therefore larger for a complex gate, which leads to a higher propagation delay.

## 3. Circuit Diagram and Schematic Simulation

### 3.1. Circuit Architecture Analysis

The 8-bit programmable multi-bit funnel shifter takes an 8-bit input word, extends it to a 15-bit word with a source generator, and finally selects an 8-bit field from this word as the output. The overall architecture is shown in Figure 2. Table 3 lists the output of the source generator. In Table 3, $Z_7$ takes one of two values, $A_7$ or $A_0$.
$Z_7$ equals $A_7$ only when the shifter performs a right shift and equals $A_0$ when it performs a left shift, so this function can be implemented with a 2-to-1 MUX. $Z_{14}$–$Z_8$ has four possible values: $A_6 \dots A_0$, $0 \dots 0$, $A_7 \dots A_7$, and $A_7 \dots A_1$. It equals $A_7 \dots A_1$ only for left shifts, whereas right shifts have three possible values, so two levels of 2-to-1 MUXes are used to select the correct input. Similarly, two stages of 2-to-1 MUXes generate $Z_6$–$Z_0$. Combining these three parts yields the source generator of the 8-bit programmable multi-bit funnel shifter. The overall schematic, consisting of the source generator and the shifting part, is shown in Figure 4. Table 4 lists the operations of the 8-bit programmable multi-bit funnel shifter.

Figure 2: Architecture of 8-bit programmable multi-bit funnel shifter [3].

Table 3: Source Generator of 8-Bit Programmable Multi-Bit Funnel Shifter [3].

| Shift Type             | $Z_{14} - Z_8$   | $Z_7$ | $Z_6 - Z_0$     |
|------------------------|------------------|-------|-----------------|
| rotate right           | $A_6 \dots A_0$  | $A_7$ | $A_6 \dots A_0$ |
| logical shift right    | $0 \dots 0$      | $A_7$ | $A_6 \dots A_0$ |
| arithmetic shift right | $A_7 \dots A_7$  | $A_7$ | $A_6 \dots A_0$ |
| rotate left            | $A_7 \dots A_1$  | $A_0$ | $A_7 \dots A_1$ |
| logical shift left     | $A_7 \dots A_1$  | $A_0$ | $0 \dots 0$     |

Table 4: Operations of 8-Bit Programmable Multi-Bit Funnel Shifter.
| Operation              | Left (S0) | Arithmetic (S1) | Shift (S2) |
|------------------------|-----------|-----------------|------------|
| rotate right           | 0         | 0               | 0          |
| logical shift right    | 0         | 0               | 1          |
| arithmetic shift right | 0         | 1               | 1          |
| rotate left            | 1         | 0               | 0          |
| logical shift left     | 1         | 0               | 1          |

### 3.2. Functionality and Performance Simulation

The functionality of the schematic is verified in Cadence. Figure 3 shows the input and output waveforms of the 8-bit programmable multi-bit funnel shifter for input vectors that make it perform an arithmetic right shift by 6 bits. Inspecting these waveforms confirms that the shifter functions correctly.

Figure 3: Schematic simulation of 8-bit programmable multi-bit funnel shifter. (a) Waveform of input signal.

The schematic contains multiple critical delay paths; one of them is highlighted in red in Figure 4. To excite this path, the input data must pass through the MUXes of all stages, which happens only when arithmetic (S1) is 1, shift (S2) is 1, and left (S0) is 0. The critical delay path is therefore excited when the funnel shifter performs an arithmetic right shift. Table 5 lists the input conditions that excite the critical delay path.

Figure 4: Schematic of 8-bit programmable multi-bit funnel shifter.

Table 5: Possible Inputs Generating the Critical Delay Path.

| Input           | Value |
|-----------------|-------|
| left (S0)       | 0     |
| arithmetic (S1) | 1     |
| shift (S2)      | 1     |
| $k_3 - k_1$     | 0 / 1 |
| $A_7 - A_0$     | 0 / 1 |

In the test, the two input vectors $(10000000)_2$ and $(01111111)_2$ are each arithmetically right-shifted by 6 bits to measure the critical-path propagation delay. The expected outputs of the funnel shifter are $(11111110)_2$ and $(00000001)_2$, respectively.
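The source generator of Table 3, the control encoding of Table 4, and the final field selection can be summarized in a short bit-level behavioral model, which is handy for producing expected outputs when checking simulation waveforms. This is a sketch, not the transistor-level circuit. It assumes that the output is the 8-bit field $Y = Z[\text{offset}+7 : \text{offset}]$, with offset $k$ (the 3-bit shift amount) for right shifts and its bitwise complement $\bar{k}$ for left shifts, since that is the offset for which the Table 3 layout of $Z$ yields a left shift. The names `OPS`, `source_generator`, and `funnel_shift` are hypothetical.

```python
from typing import List

# Control encoding from Table 4: (left S0, arithmetic S1, shift S2).
OPS = {
    "rotate right": (0, 0, 0),
    "logical shift right": (0, 0, 1),
    "arithmetic shift right": (0, 1, 1),
    "rotate left": (1, 0, 0),
    "logical shift left": (1, 0, 1),
}


def source_generator(a: int, left: int, arithmetic: int, shift: int) -> List[int]:
    """Return the 15-bit source word as a list z where z[j] is Z_j (Table 3)."""
    bits = [(a >> i) & 1 for i in range(8)]  # bits[i] is input bit A_i
    if left:
        low = [0] * 7 if shift else bits[1:8]  # Z6..Z0 = 0..0 (logical) or A7..A1 (rotate)
        mid = bits[0]                          # Z7 = A0
        high = bits[1:8]                       # Z14..Z8 = A7..A1
    else:
        low = bits[0:7]                        # Z6..Z0 = A6..A0
        mid = bits[7]                          # Z7 = A7
        if not shift:
            high = bits[0:7]                   # rotate: Z14..Z8 = A6..A0
        elif arithmetic:
            high = [bits[7]] * 7               # sign extension: A7..A7
        else:
            high = [0] * 7                     # logical: 0..0
    return low + [mid] + high                  # index j holds Z_j, j = 0..14


def funnel_shift(a: int, k: int, left: int, arithmetic: int, shift: int) -> int:
    """Select the 8-bit field Y = Z[offset+7 : offset] from the source word."""
    z = source_generator(a, left, arithmetic, shift)
    offset = (~k & 0b111) if left else k       # assumed: left shifts use the complement of k
    return sum(z[offset + i] << i for i in range(8))


# Critical-path test vectors from Section 3.2: arithmetic right shift by 6 bits.
asr = OPS["arithmetic shift right"]
print(f"{funnel_shift(0b10000000, 6, *asr):08b}")  # 11111110
print(f"{funnel_shift(0b01111111, 6, *asr):08b}")  # 00000001
```

Comparing `funnel_shift` exhaustively against Python's built-in shift operators over all 256 inputs and 8 shift amounts is one inexpensive way to cross-check the Table 3 and Table 4 entries.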
The propagation delay is identified by comparing the transitions of $A_7$ and $B_1$. The simulation results in Figure 5 assume that each output drives a 50 fF extrinsic load capacitor; the schematic-level propagation delay is approximately 1.21 ns.

Figure 5: Propagation delay at schematic level.

## 4. Layout and Post-Layout Simulation

The layout of the funnel shifter is shown in Figure 6. Four metal layers are used to achieve a smaller chip area. To optimize the delay of the whole circuit, each stage needs MUXes of a different size. The layout has five stages, and MUXes with different W/L ratios are applied so that the driver MUX of each stage has a higher W/L ratio than the load MUX of the next stage. This sizing reduces the ratio of extrinsic to intrinsic delay of each stage. In addition, the widths of the NMOS and PMOS transistors were increased by 0.36 $\mu$m. As a result, the total delay decreases from 2.7 ns to 1.65 ns. The area of the layout is given by

$$A = X \times Y \tag{2}$$

where $X$ and $Y$ are the horizontal and vertical dimensions of the layout. The measured layout area is approximately 7680 $\mu$m$^2$.

Figure 6: Layout of 8-bit programmable multi-bit funnel shifter.

Post-layout simulation is also performed to verify functionality; the result is shown in Figure 7. The simulation uses the input vector $(10000000)_2$ with the funnel shifter performing an arithmetic right shift by 6 bits, and an input rise and fall time of 0.2 ns.

Figure 7: Layout simulation of 8-bit programmable multi-bit funnel shifter. (a) Waveform of input signal. (b) Waveform of output signal.

Finally, the post-layout propagation delay is measured with three parasitic extractions (C only, R only, and RC), with each output driving a 50 fF extrinsic load capacitor. The measured results are listed in Table 6.
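The figure of merit that Equation (3) defines can be checked numerically from the two values reported above, the post-layout delay of 1.65 ns and the layout area of 7680 $\mu$m$^2$. The sketch below only performs the unit conversion and the division; the helper name `performance` is hypothetical.

```python
def performance(delay_ns: float, area_um2: float) -> float:
    """Figure of merit 1 / (delay x area), returned in m^-2 s^-1."""
    delay_s = delay_ns * 1e-9          # ns -> s
    area_m2 = area_um2 * (1e-6) ** 2   # um^2 -> m^2
    return 1.0 / (delay_s * area_m2)


# Post-layout delay and layout area reported in Section 4.
perf = performance(1.65, 7680)
print(f"{perf:.2e} m^-2 s^-1")  # 7.89e+16 m^-2 s^-1
```

The result, about $7.89 \times 10^{16}$ m$^{-2}$s$^{-1}$, rounds to the value quoted for the designed shifter.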
The delay is measured with the input vector $(10000000)_2$ and the funnel shifter performing an arithmetic right shift by 6 bits. This excites the critical propagation delay path, because that path runs through longer metal wires than the other paths. Figure 8 shows the simulation results, from which the propagation delay with RC parasitic extraction can be read.

Table 6: Delay Measurements with Different Parasitic Extractions.

| Parasitic Extraction | $t_{\rm PLH}$ (ns) | $t_{\rm PHL}$ (ns) | $t_{\rm p}$ (ns) |
|----------------------|--------------------|--------------------|------------------|
| C                    | 1.599              | 1.690              | 1.645            |
| R                    | 1.369              | 1.430              | 1.400            |
| RC                   | 1.607              | 1.690              | 1.649            |

Figure 8: Propagation delay at layout level.

Comparing these propagation delays, the RC extraction gives the longest delay and the R-only extraction the shortest. The difference between the schematic-level delay (1.21 ns) and the post-layout delay shows that schematic-level simulation does not account for loading effects, a limitation that post-layout simulation overcomes. With QRC parasitic extraction, the delay can be estimated properly by accounting for the parasitic capacitances and resistances in the layout. Because the propagation delay is the sum of the intrinsic and extrinsic delays, once the logical effort of each gate is known, a good delay estimate of the design can be obtained easily.

The performance of the shifter is defined as

$$\text{Performance} = \frac{1}{\text{post-layout delay} \times \text{area}} \tag{3}$$

Thus, the performance of the designed funnel shifter is approximately $7.9 \times 10^{16}$ m$^{-2}$s$^{-1}$.

## 5. Conclusions

An 8-bit CMOS programmable multi-bit funnel shifter, a structure commonly used in arithmetic logic units (ALUs) to improve delay and area compared with a barrel shifter, has been analyzed, designed, and verified by simulation. The simulation results show that the performance of the proposed funnel shifter remains favorable under different load conditions.

## References

- [1] A. Hodjat and I. Verbauwhede, "A 21.54 Gbits/s fully pipelined AES processor on FPGA," in *12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines*, IEEE, 2004.
- [2] A. Vetteth et al., "Quantum-dot cellular automata carry-look-ahead adder and barrel shifter," in *IEEE Emerging Telecommunications Technologies Conference*, 2002.
- [3] N. H. E. Weste and D. M. Harris, *CMOS VLSI Design: A Circuits and Systems Perspective*, 4th ed., Pearson, 2011.
- [4] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, *Digital Integrated Circuits: A Design Perspective*, 2nd ed., Prentice Hall, 2003.